Hierarchy Flow For High-Fidelity Image-to-Image Translation
Image-to-image (I2I) translation comprises a wide spectrum of tasks. Here we
divide this problem into three levels: strong-fidelity translation,
normal-fidelity translation, and weak-fidelity translation, indicating the
extent to which the content of the original image is preserved. Although
existing methods achieve good performance in weak-fidelity translation, they
fail to fully preserve the content in both strong- and normal-fidelity tasks,
e.g., sim2real, style transfer, and low-level vision. In this work, we propose
Hierarchy Flow, a novel flow-based model to achieve better content preservation
during translation. Specifically, 1) we first unveil the drawbacks of standard
flow-based models when applied to I2I translation. 2) Next, we propose a new
design, namely hierarchical coupling for reversible feature transformation and
multi-scale modeling, to constitute Hierarchy Flow. 3) Finally, we present a
dedicated aligned-style loss for a better trade-off between content
preservation and stylization during translation. Extensive experiments on a
wide range of I2I translation benchmarks demonstrate that our approach achieves
state-of-the-art performance, with convincing advantages in both strong- and
normal-fidelity tasks. Code and models will be available at
https://github.com/WeichenFan/HierarchyFlow.
Comment: arXiv admin note: text overlap with arXiv:2207.0190
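As background for the reversible design the abstract refers to, here is a minimal NumPy sketch of a generic additive coupling layer, the standard invertible building block of flow-based models. The half-and-half split and the toy network `t` are illustrative assumptions, not the paper's hierarchical coupling or multi-scale design.

```python
import numpy as np

def t(x_a):
    # Toy "transformation" network: any function of x_a works here,
    # because invertibility comes from the coupling structure itself.
    return np.tanh(x_a) * 0.5

def coupling_forward(x):
    # Split features into two halves (an illustrative split,
    # not the paper's hierarchical partition).
    x_a, x_b = np.split(x, 2)
    y_b = x_b + t(x_a)           # transform one half conditioned on the other
    return np.concatenate([x_a, y_b])

def coupling_inverse(y):
    y_a, y_b = np.split(y, 2)
    x_b = y_b - t(y_a)           # exact inverse: subtract the same shift
    return np.concatenate([y_a, x_b])

x = np.array([0.2, -1.0, 0.7, 3.0])
y = coupling_forward(x)
x_rec = coupling_inverse(y)
assert np.allclose(x, x_rec)     # bit-for-bit reversibility is what lets
                                 # flow models preserve content exactly
```

The exact invertibility shown here is the property that makes flows attractive for content-preserving translation: no information about the input is discarded in the forward pass.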
FAT: An In-Memory Accelerator with Fast Addition for Ternary Weight Neural Networks
Convolutional Neural Networks (CNNs) demonstrate excellent performance in
various applications but have high computational complexity. Quantization is
applied to reduce the latency and storage cost of CNNs. Among the quantization
methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique
advantage over 8-bit and 4-bit quantization. They replace the multiplication
operations in CNNs with additions, which are favoured on In-Memory-Computing
(IMC) devices. IMC acceleration for BWNs has been widely studied. However,
although TWNs offer higher accuracy and better sparsity than BWNs, IMC
acceleration for TWNs has received little research attention. TWNs run
inefficiently on existing IMC devices because their sparsity is not well
utilized and the addition operation is inefficient.
In this paper, we propose FAT as a novel IMC accelerator for TWNs. First, we
propose a Sparse Addition Control Unit, which utilizes the sparsity of TWNs to
skip the null operations on zero weights. Second, we propose a fast addition
scheme based on the memory Sense Amplifier to avoid the time overhead of both
carry propagation and writing back the carry to memory cells. Third, we further
propose a Combined-Stationary data mapping to reduce the data movement of
activations and weights and increase the parallelism across memory columns.
Simulation results show that for addition operations at the Sense Amplifier
level, FAT achieves a 2.00X speedup, 1.22X higher power efficiency, and 1.22X
higher area efficiency than ParaPIM, a state-of-the-art IMC accelerator. On
networks with 80% average sparsity, FAT achieves a 10.02X speedup and 12.19X
higher energy efficiency than ParaPIM.
Comment: 14 pages
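To illustrate why ternary weights suit addition-based hardware, here is a small Python sketch of a ternary dot product. This is a software analogy, not FAT's circuit: with weights restricted to {-1, 0, +1}, every multiplication reduces to an add or a subtract, and zero weights can be skipped outright, which is the sparsity the abstract's Sparse Addition Control Unit exploits.

```python
import numpy as np

def ternary_dot(activations, weights):
    """Dot product with weights in {-1, 0, +1}: multiplications become
    additions/subtractions, and zero weights are null operations that
    can be skipped entirely."""
    acc = 0.0
    for a, w in zip(activations, weights):
        if w == 0:
            continue             # skip the null operation on a zero weight
        acc += a if w > 0 else -a
    return acc

acts = np.array([0.5, 1.0, -2.0, 3.0])
w = np.array([1, 0, -1, 0])      # 50% sparsity in this toy example
print(ternary_dot(acts, w))      # 0.5 + 2.0 = 2.5
```

On an IMC device the adds happen in parallel at the sense amplifiers rather than in a loop, but the arithmetic being replaced is exactly this.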
Link-Context Learning for Multimodal LLMs
The ability to learn from context with novel concepts and deliver
appropriate responses is essential in human conversations. Despite current
Multimodal Large Language Models (MLLMs) and Large Language Models (LLMs) being
trained on mega-scale datasets, recognizing unseen images or understanding
novel concepts in a training-free manner remains a challenge. In-Context
Learning (ICL) explores training-free few-shot learning, where models are
encouraged to "learn to learn" from limited tasks and generalize to unseen
tasks. In this work, we propose link-context learning (LCL), which emphasizes
"reasoning from cause and effect" to augment the learning capabilities of
MLLMs. LCL goes beyond traditional ICL by explicitly strengthening the causal
relationship between the support set and the query set. By providing
demonstrations with causal links, LCL guides the model to discern not only the
analogy but also the underlying causal associations between data points, which
empowers MLLMs to recognize unseen images and understand novel concepts more
effectively. To facilitate the evaluation of this novel approach, we introduce
the ISEKAI dataset, consisting exclusively of unseen, generated image-label
pairs designed for link-context learning. Extensive experiments show that our
LCL-MLLM exhibits strong link-context learning capabilities on novel concepts
compared with vanilla MLLMs. Code and data will be released at
https://github.com/isekai-portal/Link-Context-Learning.
Comment: 10 pages, 8 figures
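To make the idea concrete, here is a hypothetical sketch of how link-context demonstrations might be assembled into a prompt. The tag format, label names, and function are assumptions for illustration only, not the paper's actual interface; real MLLM prompts interleave image embeddings with text rather than placeholder tokens.

```python
def build_lcl_prompt(support_pairs, query_image_token):
    """Assemble a few-shot prompt from causally linked image-label
    demonstrations. Because the labels are novel (unseen in training),
    the model can only answer the query by following the image-label
    link established in the support set, not by recalling pretraining."""
    lines = []
    for img_token, label in support_pairs:
        # Each demonstration states the image-label link explicitly.
        lines.append(f"<img>{img_token}</img> This is a {label}.")
    lines.append(f"<img>{query_image_token}</img> What is this?")
    return "\n".join(lines)

# Hypothetical novel concept in the spirit of ISEKAI's generated pairs.
support = [("img_demo_1", "cabbage-dog"),
           ("img_demo_2", "cabbage-dog")]
prompt = build_lcl_prompt(support, "img_query")
print(prompt)
```

The causal link is the key contrast with vanilla ICL: the support pairs are not merely analogous examples but the sole source of the label, so answering correctly demonstrates reasoning from the demonstrated cause-and-effect mapping.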